Index Compression and Redundancy Elimination in Large Textual Collections

Author

  • Hao Yan

Similar resources

Analysis of Lossless Reversible Transformation Algorithms to Enhance Data Compression

In this paper we analyze and present the benefits offered in lossless compression by applying a choice of preprocessing methods that exploit the redundancy of the source file. Textual data has a number of properties that can be taken into account to improve compression. Preprocessing copes with these properties by applying a number of transformations that make th...

Fast Relative Lempel-Ziv Self-index for Similar Sequences

Recent advances in biotechnology and web technology are generating huge collections of similar strings. People now face the problem of storing them compactly while supporting fast pattern searching. One compression scheme called relative Lempel-Ziv compression uses textual substitutions from a reference text as follows: Given a (large) set S of strings, represent each string in S as a concatena...

Redundancy Elimination Within Large Collections of Files

Ongoing advancements in technology lead to ever-increasing storage capacities. In spite of this, optimizing storage usage can still provide rich dividends. Several techniques based on delta-encoding and duplicate block suppression have been shown to reduce storage overheads, with varying requirements for resources such as computation and memory. We propose a new scheme for storage reduction that...

Leveraging naturally distributed data redundancy to optimize collective replication

Dumping large amounts of related data simultaneously to local storage devices instead of a parallel file system is a frequent I/O pattern of HPC applications running at large scale. Since local storage resources are prone to failures and have limited potential to serve multiple requests in parallel, techniques such as replication are often used to enable resilience and high availability. Howeve...

Using the Web to Reduce Data Sparseness in Pattern-Based Information Extraction

Textual patterns have been used effectively to extract information from large text collections. However, they rely heavily on textual redundancy, in the sense that facts have to be mentioned in a similar manner in order to be generalized to a textual pattern. Data sparseness thus becomes a problem when trying to extract information from hardly redundant sources like corporate intranets, encyclope...

Publication date: 2010